


Chapter 1 Data–Its Source And Compilation



Introduction

Data is omnipresent in our daily lives and crucial for understanding various phenomena. We encounter data in many forms, from weather reports to geographical information presented in maps and tables detailing population, production, and trade.

This chapter delves into what data is, why it is needed, how it is obtained, and how it is processed and presented to extract meaningful information, helping us answer questions about patterns and distributions in the world.


Definition Of Data

Data can be defined as numbers or characters that represent measurements or observations from the real world. A single observation or measurement is called a datum. Examples include numerical facts like rainfall measurements or distances between places.

While a large volume of raw data might be available, it is often difficult to draw meaningful conclusions directly from this unprocessed form. For data to become useful, it needs to be processed to derive logical conclusions or statistically calculated information. Processed data, which provides a meaningful answer to a query or stimulates further inquiry, is called information.



Need Of Data

Data is essential for geographical analysis and understanding the world. Maps are primary tools, but data in tabular form is also vital for explaining the distribution and growth of geographical phenomena.

Geography studies the interrelationships between various phenomena on the Earth's surface. These interactions are influenced by many variables that can often be best understood and analysed in quantitative terms. Therefore, statistical analysis of data has become a necessity in modern geographical studies.

For instance, studying agriculture in an area requires statistical data on cropped area, yield, production, irrigation, rainfall, and inputs like fertilizers and pesticides. Similarly, analysing urban growth necessitates data on population size, density, migration, occupation, income, industries, and transport/communication networks. Thus, data provides the empirical basis for geographical analysis and understanding complex relationships.



Presentation Of The Data

Collecting data is important, but presenting it effectively is equally crucial to avoid misinterpretations and derive accurate insights. Raw data can be misleading if not properly analysed and presented.

Statistical Fallacy Example:

A man needed to cross a river with his family (his wife and their 5-year-old child). He measured the depth at four points: 0.6 m, 0.8 m, 0.9 m, and 1.5 m, and calculated the average depth as 0.95 m. Since his child was 1 m tall, he thought the crossing was safe for the child, but the child drowned.

Answer:

This story illustrates a statistical fallacy. The average depth (0.95 m) was less than the child's height (1 m), but an average says nothing about the depth at any particular point. At one point the river was 1.5 m deep, well above the child's height, which led to the tragic outcome. Relying on an average alone, without considering how the individual values are distributed around it, can be dangerously misleading.
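The arithmetic behind the story can be checked with a short sketch. The variable names below are illustrative only; the depths and the child's height are the figures given in the example.

```python
from statistics import mean

# Depths measured at the four points (metres), as given in the story.
depths_m = [0.6, 0.8, 0.9, 1.5]
child_height_m = 1.0

average_depth = mean(depths_m)   # a single central value
deepest_point = max(depths_m)    # the value that decides whether crossing is safe

print(f"Average depth: {average_depth:.2f} m")
print(f"Deepest point: {deepest_point:.2f} m")
print("Average is below the child's height:", average_depth < child_height_m)
print("Deepest point is below the child's height:", deepest_point < child_height_m)
```

The average passes the safety check while the deepest point fails it, which is exactly the gap the father overlooked.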

The increasing use of data across disciplines, including geography, highlights the shift from purely qualitative descriptions to more quantitative analysis. Using statistical methods for data analysis, presentation, and interpretation is essential for making studies more logical, deriving precise conclusions, and accurately explaining variations and relationships between phenomena over space and time.



Sources Of Data

Data can be obtained from different sources, broadly categorised based on how the data is originally collected.


Primary Sources

These are sources where data is collected for the first time by an individual, group, institution, or organisation for a specific purpose.


Secondary Sources

These are sources where data is collected from existing published or unpublished records that were originally compiled by someone else for another purpose.

Figure 1.1 provides a diagrammatic representation of the different methods for collecting data, categorized into primary and secondary sources.

Figure 1.1: Methods of data collection, classified into primary and secondary data with their respective sub-methods


Sources Of Primary Data

Primary data is collected directly from the field or source. Several methods are used to gather primary data:


Personal Observations

This involves direct observation of phenomena in the field by a researcher or group. Through field surveys, geographers can gather information about physical features (terrain, drainage, soil, vegetation) and human aspects (population structure, settlements, transport). For unbiased and accurate personal observations, the observer needs relevant theoretical knowledge and a scientific, objective approach.


Interview

In the interview method, researchers directly interact with respondents through dialogue and conversation to obtain information. To conduct an effective interview, the researcher should:

  • Prepare a clear list of questions or topics beforehand.
  • Be fully aware of the survey's objectives.
  • Build rapport with the respondent, especially for sensitive questions, ensuring confidentiality.
  • Create a comfortable and encouraging atmosphere.
  • Use simple, polite language.
  • Avoid questions that might offend or hurt the respondent.
  • Ask if the respondent has any additional relevant information at the end.
  • Express gratitude for their time.


Questionnaire/Schedule

This method uses a set of written questions. In a questionnaire, questions are provided, often with multiple-choice answers to be ticked, or space for written responses. The objectives of the survey should be stated. Questionnaires are useful for surveying large areas and can be mailed. However, they are suitable only for literate respondents. A schedule is similar to a questionnaire, but it is filled out by a trained enumerator who asks the questions to the respondent. The advantage of a schedule is that it can be used to collect data from both literate and illiterate individuals.


Other Methods

Primary data can also be collected using specialized tools and techniques. For example, soil and water characteristics can be measured directly in the field using soil kits and water quality kits. Field scientists might use transducers or other instruments to collect data on crop health or vegetation characteristics.

Field scientist measuring crop health with a device


Secondary Source Of Data

Secondary data is information that has already been collected and compiled by others. It exists in various published and unpublished forms.


Published Sources

These are records that have been officially published and made available to the public.


Government Publications

Publications of government ministries and departments at the central, state, and district levels are crucial secondary data sources. Examples include the Census of India (published by the Office of the Registrar General of India), reports of the National Sample Survey Office (NSSO), Weather Reports of the India Meteorological Department, State Statistical Abstracts, and periodical reports published by various Commissions.

Stack of government publications

Semi/Quasi-Government Publications

These are publications and reports of organisations that are not fully government departments but perform a public function, such as Urban Development Authorities, Municipal Corporations, and Zila Parishads.


International Publications

Yearbooks, reports, and monographs published by international organisations, particularly agencies of the United Nations, are valuable global data sources. Examples include publications from UNESCO, UNDP, WHO, and FAO. Periodical UN publications include the Demographic Year Book, Statistical Year Book, and the Human Development Report.

Stack of United Nations publications

Private Publications

Yearbooks, surveys, research reports, and monographs published by private companies, research institutions, and newspapers also serve as secondary data sources.


Newspapers And Magazines

Daily newspapers and various periodicals (weekly, fortnightly, monthly magazines) are easily accessible sources for a wide range of secondary data, including statistics, reports, and survey findings.


Electronic Media

Modern electronic media, especially the Internet, has become a major repository and source of secondary data. Websites of government agencies, international organisations, research institutions, news archives, and other entities provide vast amounts of readily accessible data.


Unpublished Sources

Data also exists in records that have not been formally published but are maintained by organisations.


Government Documents

Unpublished reports, documents, and records maintained at different government levels (e.g., village-level revenue records kept by the patwari) serve as important secondary sources for detailed information.


Quasi-Government Records

Periodical reports, development plans, and records maintained by Municipal Corporations, District Councils, and public utility departments are included in this category of unpublished records.


Private Documents

Unpublished reports and records held by private companies, trade unions, political and other organisations, and residents' welfare associations are also potential sources of secondary data.



Tabulation And Classification Of Data

Raw data, whether from primary or secondary sources, is often a disordered collection of information that is difficult to understand. To make it usable and enable meaningful analysis, raw data needs to be organised through tabulation and classification.


Statistical Table

A Statistical Table is a fundamental tool for summarising and presenting data. It involves systematically arranging data into columns and rows. The purpose is to simplify presentation, facilitate comparisons between different data points, and allow readers to quickly locate specific information. Tables help organise a large volume of data efficiently within a limited space.



Data Compilation And Presentation

Data, after collection and tabulation, can be presented in various forms: as absolute values, percentages or ratios, or as index numbers.


Absolute Data

When data is presented in its original, raw numerical form (integers), it is called absolute data or raw data. Examples include total population counts, total production figures, etc. Table 1.1 shows the absolute population figures for India and selected states/UTs in 2011.

| State/UT Code | India/State/Union Territory | Total Population: Persons | Males | Females |
|---|---|---|---|---|
| | INDIA | 1,21,05,69,573 | 62,31,21,843 | 58,74,47,730 |
| 1 | Jammu and Kashmir | 1,25,41,302 | 66,40,662 | 59,00,640 |
| 2 | Himachal Pradesh | 68,64,602 | 34,81,873 | 33,82,729 |
| 3 | Punjab | 2,77,43,338 | 1,46,39,465 | 1,31,03,873 |
| 4 | Chandigarh | 10,55,450 | 5,80,663 | 4,74,787 |
| 5 | Uttarakhand | 1,00,86,292 | 51,37,773 | 49,48,519 |
| 6 | Haryana | 2,53,51,462 | 1,34,94,734 | 1,18,56,728 |
| 7 | National Capital Territory of Delhi | 1,67,87,941 | 89,87,326 | 78,00,615 |
| 8 | Rajasthan | 6,85,48,437 | 3,55,50,997 | 3,29,97,440 |
| 9 | Uttar Pradesh | 19,98,12,341 | 10,44,80,510 | 9,53,31,831 |
| 10 | Bihar | 10,40,99,452 | 5,42,78,157 | 4,98,21,295 |

Percentage/Ratio

Data is often presented as percentages or ratios, which are calculated by comparing data to a common parameter. This allows for easier comparison and analysis of proportions or rates. Examples include literacy rates, population growth rates, or percentage of workers in different sectors.

Table 1.2 shows India's literacy rates over decades in percentage form. Literacy rate is calculated as:

$ \text{Literacy Rate} (\%) = \frac{\text{Total Number of Literates}}{\text{Total Population}} \times 100 $

Year Person Male Female
1951 18.33 27.16 8.86
1961 28.3 40.4 15.35
1971 34.45 45.96 21.97
1981 43.57 56.38 29.76
1991 52.21 64.13 39.29
2001 64.84 75.85 54.16
2011 73.0 80.9 64.6
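The literacy rate formula above can be sketched as a small function. The function name and the village figures are hypothetical, chosen only to illustrate the calculation; they are not census data.

```python
def literacy_rate(literates: int, population: int) -> float:
    """Literacy rate in per cent: literates / total population * 100."""
    if population <= 0:
        raise ValueError("population must be positive")
    return literates / population * 100


# Hypothetical figures for an imaginary village, for illustration only.
rate = literacy_rate(literates=1_460, population=2_000)
print(f"Literacy rate: {rate:.1f}%")
```

Because the result is a percentage, places with very different population sizes can be compared directly, which absolute counts do not allow.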

Index Number

An index number is a statistical measure that shows changes in a variable, or a group of variables, relative to a base period, location, or characteristic. Index numbers are useful for comparing changes over time or differences between entities, and are widely used in economics to track changes in price or quantity.

A common method for calculating index numbers is the simple aggregate method, using the formula:

$ \text{Index Number} = \frac{\sum q_1}{\sum q_0} \times 100 $

Here, $ \sum q_1 $ is the sum of the current-year values and $ \sum q_0 $ is the sum of the base-year values. The index for the base year therefore works out to 100, and every other year is read as a percentage of that base level.

Example: Index Number Calculation for Iron Ore Production

Using 1970-71 as the base year, calculate the index numbers for iron ore production in India over different years based on Table 1.3.

Answer:

| Year | Production (in million tonnes) | Calculation | Index Number |
|---|---|---|---|
| 1970-71 | 32.5 | $\frac{32.5}{32.5} \times 100$ | 100 |
| 1980-81 | 42.2 | $\frac{42.2}{32.5} \times 100$ | 130 |
| 1990-91 | 53.7 | $\frac{53.7}{32.5} \times 100$ | 165 |
| 2000-01 | 67.4 | $\frac{67.4}{32.5} \times 100$ | 207 |

The index numbers show the change in iron ore production relative to the 1970-71 level (index 100). For example, in 2000-01 production stood at 207% of the 1970-71 level, meaning it had more than doubled.
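The same calculation can be reproduced programmatically. This is a minimal sketch that uses the production figures from the example; the names `production_mt` and `index_number` are illustrative, and rounding to the nearest whole number is an assumption made to match the worked answer.

```python
# Iron ore production in million tonnes, taken from the worked example.
production_mt = {
    "1970-71": 32.5,
    "1980-81": 42.2,
    "1990-91": 53.7,
    "2000-01": 67.4,
}
base_year = "1970-71"


def index_number(current: float, base: float) -> float:
    """Current value expressed as a percentage of the base value."""
    return current / base * 100


# Round to the nearest whole number, as in the worked answer.
index_numbers = {
    year: round(index_number(value, production_mt[base_year]))
    for year, value in production_mt.items()
}

for year, index in index_numbers.items():
    print(year, index)
```

Changing `base_year` rescales every index relative to the new base, which is why the base period should always be stated alongside the figures.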


Processing Of Data

Raw data is often presented as a large, unstructured set of figures (ungrouped data), making it difficult to interpret. Processing raw data involves organizing it into a more understandable form, primarily through tabulation and classification into groups or classes.

Table 1.4 shows an example of raw, ungrouped data (scores of 60 students). The next step in processing is to group this data.

47 02 39 64 22 46 28 02 09 10
89 96 74 06 26 15 92 84 84 90
32 22 53 62 73 57 37 44 67 50
18 51 36 58 28 65 63 59 75 70
56 58 43 74 64 12 35 42 68 80
64 37 17 31 41 71 56 83 59 90

Grouping Of Data

Grouping involves deciding how many classes the data will be divided into and how wide each class interval will be. The choice depends on the range of the raw data (the difference between the highest and lowest values). The scores in Table 1.4 run from 02 to 96, a range of 94, so ten classes with an interval of ten units (0-10, 10-20, and so on up to 90-100) give a convenient grouping.
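As a rough illustration, the range and the class limits can be derived from the raw scores as follows. The scores are those of Table 1.4; the variable names and the choice to start the first class at 0 are assumptions made for this sketch.

```python
# Raw scores of 60 students (Table 1.4).
scores = [
    47, 2, 39, 64, 22, 46, 28, 2, 9, 10,
    89, 96, 74, 6, 26, 15, 92, 84, 84, 90,
    32, 22, 53, 62, 73, 57, 37, 44, 67, 50,
    18, 51, 36, 58, 28, 65, 63, 59, 75, 70,
    56, 58, 43, 74, 64, 12, 35, 42, 68, 80,
    64, 37, 17, 31, 41, 71, 56, 83, 59, 90,
]

lowest, highest = min(scores), max(scores)
data_range = highest - lowest

# Ten classes of width 10, starting at 0, cover every score.
class_width = 10
class_limits = [(lower, lower + class_width) for lower in range(0, 100, class_width)]

print("Lowest:", lowest, "Highest:", highest, "Range:", data_range)
print("Classes:", class_limits)
```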


Process Of Classification

Once the classes and intervals are set, the raw data is classified by assigning each value to its appropriate group. This is commonly done with tally marks, a procedure known as the Four and Cross Method. One tally mark is made against the corresponding class for each data point. After four marks, the fifth is drawn as a line crossing the previous four, so the marks can be counted in groups of five.

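| Group | Raw values | Tally marks | Number of individuals |
|---|---|---|---|
| 0-10 | 02, 02, 09, 06 | IIII | 4 |
| 10-20 | 10, 15, 18, 12, 17 | IIII/ | 5 |
| 20-30 | 22, 28, 26, 22, 28 | IIII/ | 5 |
| 30-40 | 39, 32, 37, 36, 35, 37, 31 | IIII/ II | 7 |
| 40-50 | 47, 46, 44, 43, 42, 41 | IIII/ I | 6 |
| 50-60 | 53, 57, 50, 51, 58, 59, 56, 58, 56, 59 | IIII/ IIII/ | 10 |
| 60-70 | 64, 62, 67, 65, 63, 64, 68, 64 | IIII/ III | 8 |
| 70-80 | 74, 73, 75, 70, 74, 71 | IIII/ I | 6 |
| 80-90 | 89, 84, 84, 80, 83 | IIII/ | 5 |
| 90-100 | 96, 92, 90, 90 | IIII | 4 |
| $\sum f = N$ | | | 60 |

Note: a trailing / stands for the cross stroke that completes a group of five.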
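The classification step can be sketched in code. The block below assigns each score from Table 1.4 to its class of width 10 and prints a text tally. The `tally` helper and its notation (a trailing `/` for the cross stroke that completes a group of five) are conventions invented for this sketch.

```python
# Raw scores of 60 students (Table 1.4).
scores = [
    47, 2, 39, 64, 22, 46, 28, 2, 9, 10,
    89, 96, 74, 6, 26, 15, 92, 84, 84, 90,
    32, 22, 53, 62, 73, 57, 37, 44, 67, 50,
    18, 51, 36, 58, 28, 65, 63, 59, 75, 70,
    56, 58, 43, 74, 64, 12, 35, 42, 68, 80,
    64, 37, 17, 31, 41, 71, 56, 83, 59, 90,
]

class_width = 10
frequencies = [0] * 10  # one counter per class: 0-10, 10-20, ..., 90-100

# Integer division puts each score in its class; 30 falls in 30-40, not 20-30.
for score in scores:
    frequencies[score // class_width] += 1


def tally(count: int) -> str:
    """Render a count as tally marks, with 'IIII/' for each complete group of five."""
    fives, rest = divmod(count, 5)
    groups = ["IIII/"] * fives
    if rest:
        groups.append("I" * rest)
    return " ".join(groups)


for i, f in enumerate(frequencies):
    lower = i * class_width
    print(f"{lower:>2}-{lower + class_width:<3} {tally(f):<12} {f}")
print("Total:", sum(frequencies))
```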

Frequency Distribution

The result of classifying raw data into groups is a frequency distribution. It shows how often each value or range of values (class) appears in the dataset. This distribution is presented in a frequency table.


Simple Frequencies

The number of observations falling within each specific class is called its simple frequency, denoted by 'f'. The sum of all simple frequencies across all classes equals the total number of observations in the dataset, denoted by N ($\sum f = N$). Table 1.6 shows the simple frequencies for the student scores.

Group f Cf
00-10 4 4
10-20 5 9
20-30 5 14
30-40 7 21
40-50 6 27
50-60 10 37
60-70 8 45
70-80 6 51
80-90 5 56
90-100 4 60
$\sum f = N$ 60

Cumulative Frequencies

Cumulative frequency (Cf) for a class is obtained by adding the simple frequency of that class to the cumulative frequency of the preceding class. The cumulative frequency of the first class is its simple frequency. The last cumulative frequency equals the total number of observations (N or $\sum f$). Table 1.6 shows cumulative frequencies. Cumulative frequencies are useful for quickly determining the number of observations below or above a certain value (e.g., how many students scored less than 50).
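A running total is all that is needed to build the Cf column. The sketch below uses the simple frequencies of the student scores; the variable names are illustrative.

```python
from itertools import accumulate

# Simple frequencies (f) for the classes 0-10, 10-20, ..., 90-100.
frequencies = [4, 5, 5, 7, 6, 10, 8, 6, 5, 4]

# Each cumulative frequency is the class's own f plus the preceding Cf.
cumulative = list(accumulate(frequencies))
print(cumulative)

# Students scoring less than 50 = Cf of the class 40-50 (the fifth class).
below_50 = cumulative[4]
print("Scored less than 50:", below_50)
```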

When forming classes for frequency distribution, either the exclusive or inclusive method is used.


Exclusive Method

In the exclusive method (shown in Table 1.6), the upper limit of a class is the same as the lower limit of the next class (e.g., 20-30, 30-40). An observation with a value equal to the shared limit (e.g., 30) is included in the class where it appears as the lower limit (30-40) and excluded from the class where it appears as the upper limit (20-30). Thus, a class interval like 20-30 includes values from 20 up to (but not including) 30.


Inclusive Method

In the inclusive method (shown in Table 1.7), the upper limit of a class is included within that class (e.g., 0-9, 10-19). The upper limit of one class is typically one less than the lower limit of the next class. A class interval like 0-9 includes values from 0 up to and including 9. Both the upper and lower limits are included in counting frequencies for that class.

Group f Cf
0 – 9 4 4
10 – 19 5 9
20 – 29 5 14
30 – 39 7 21
40 – 49 6 27
50 – 59 10 37
60 – 69 8 45
70 – 79 6 51
80 – 89 5 56
90 – 99 4 60
$\sum f = N$ 60
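The difference between the two methods shows up only in how the class is labelled and where a boundary value lands. The following sketch assumes whole-number values and a class width of 10; the function names are hypothetical.

```python
def exclusive_class(value: int, width: int = 10) -> str:
    """Class label under the exclusive method, e.g. 30 -> '30-40'."""
    lower = (value // width) * width
    return f"{lower}-{lower + width}"


def inclusive_class(value: int, width: int = 10) -> str:
    """Class label under the inclusive method, e.g. 30 -> '30-39'."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


for value in (9, 29, 30):
    print(value, exclusive_class(value), inclusive_class(value))
```

For whole-number scores both methods place every observation in the same group, which is why the frequencies in the two tables are identical.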

Frequency Polygon

A frequency polygon is a graph that visually represents a frequency distribution. It is created by plotting points representing the frequencies of each class against the midpoints of the class intervals and connecting these points with straight lines. Frequency polygons are useful for comparing two or more frequency distributions on the same graph.

Graph showing a frequency distribution represented as a polygon
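The points of a frequency polygon can be computed before any plotting is done. This sketch derives the class midpoints for the student scores and pairs them with the simple frequencies. Drawing the resulting points with a plotting library is left out, and the variable names are illustrative.

```python
# Exclusive classes 0-10, 10-20, ..., 90-100 and their simple frequencies.
class_limits = [(lower, lower + 10) for lower in range(0, 100, 10)]
frequencies = [4, 5, 5, 7, 6, 10, 8, 6, 5, 4]

# The midpoint of each class is the average of its lower and upper limits.
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]

# Each (midpoint, frequency) pair is one vertex of the polygon.
points = list(zip(midpoints, frequencies))
for x, y in points:
    print(x, y)
```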

Ogive

An Ogive (pronounced as 'ojive') is a graph of a cumulative frequency distribution. It is constructed by plotting cumulative frequencies against the class boundaries.

There are two types of Ogives:

  • Less than Ogive: Constructed using the upper limits of the classes and the 'less than' cumulative frequencies. These are calculated by successively adding the simple frequencies, starting from the first (lowest) class. When plotted, this results in a rising curve (Table 1.8, Fig. 1.6).
  • More than Ogive: Constructed using the lower limits of the classes and the 'more than' cumulative frequencies. These are calculated by starting from the total (N) and successively subtracting the simple frequency of each class, beginning with the first (lowest) class. When plotted, this results in a declining curve (Table 1.9, Fig. 1.7).

Plotting both the less than and more than Ogives on the same graph provides a comprehensive visual representation of the cumulative distribution (Table 1.10, Fig. 1.8).
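Both sets of cumulative frequencies can be derived from the simple frequencies alone. The sketch below does this for the student scores. The variable names are illustrative, and "more than" is read as "at or above the lower limit", in line with the exclusive classes used earlier.

```python
from itertools import accumulate

lower_limits = list(range(0, 100, 10))               # 0, 10, ..., 90
upper_limits = [lower + 10 for lower in lower_limits]  # 10, 20, ..., 100
frequencies = [4, 5, 5, 7, 6, 10, 8, 6, 5, 4]
total = sum(frequencies)

# Less than: running total of f, paired with each upper limit.
less_than = list(accumulate(frequencies))

# More than: total minus the frequencies of all classes below each lower limit.
more_than = [total - below for below in [0] + less_than[:-1]]

for upper, cf in zip(upper_limits, less_than):
    print(f"Less than {upper}: {cf}")
for lower, cf in zip(lower_limits, more_than):
    print(f"More than {lower}: {cf}")
```

At every class boundary the two values add up to N, which offers a quick check on a hand-built table: the 'less than' figure for a limit plus the 'more than' figure for the same limit should equal 60.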

Less than Method Cf
Less than 10 4
Less than 20 9
Less than 30 14
Less than 40 21
Less than 50 27
Less than 60 37
Less than 70 45
Less than 80 51
Less than 90 56
Less than 100 60

Fig. 1.6: Less than Ogive (rising curve)

More than Method Cf
More than 0 60
More than 10 56
More than 20 51
More than 30 46
More than 40 39
More than 50 33
More than 60 23
More than 70 15
More than 80 9
More than 90 4

Fig. 1.7: More than Ogive (declining curve)

Marks obtained Less than More than
0 - 10 4 60
10 - 20 9 56
20 - 30 14 51
30 - 40 21 46
40 - 50 27 39
50 - 60 37 33
60 - 70 45 23
70 - 80 51 15
80 - 90 56 9
90 - 100 60 4

Fig. 1.8: Less than and More than Ogives plotted together